The LIMSI SDR System for TREC-9
Abstract
In this paper we describe the LIMSI Spoken Document Retrieval system used in the TREC-9 evaluation. This system combines an adapted version of the LIMSI 1999 Hub-4E transcription system for speech recognition with text-based IR methods. Compared with the LIMSI TREC-8 system, this year’s system can index the audio data without knowledge of the story boundaries by using a double windowing approach. The query expansion procedure of the information retrieval component has been revised and makes use of contemporaneous text sources. Experimental results are reported in terms of mean average precision for both the TREC SDR’99 and SDR’00 queries on the same 557-hour data set. For the focus condition (unknown story boundaries), with a 20% word error rate, the system achieves a mean average precision of 0.5250 on the SDR’99 queries and 0.3706 on the SDR’00 queries.
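Mean average precision, the figure of merit quoted above, averages per-query average precision over all queries. The following is a minimal sketch of the non-interpolated form commonly used in TREC evaluations; it is not the official trec_eval scorer, and the function names and document IDs are hypothetical illustrations.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for a single query."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant document
    # Relevant documents that are never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-query average precision over every judged query."""
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical usage: two queries with made-up ranked lists and relevance judgments.
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
map_score = mean_average_precision(runs, qrels)
```

Iterating over the judged queries (rather than over the submitted runs) means a query with no retrieved documents still counts as zero, which is the usual convention.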
Similar resources
The LIMSI SDR System for TREC-8
In this paper we report on our TREC-8 SDR system, which combines an adapted version of the LIMSI 1998 Hub-4E transcription system for speech recognition with an IR system based on the Okapi term weighting function. Experimental results are given in terms of word error rate and average precision for both the SDR’98 and SDR’99 data sets. In addition to the Okapi approach, we also investigated a Mar...
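The Okapi term weighting function mentioned above is usually written as BM25. The sketch below assumes the textbook BM25 form with the Robertson/Sparck Jones inverse document frequency and the common defaults k1 = 1.2 and b = 0.75; the exact variant, parameters, stemming, and stop list used by LIMSI are not stated here, so treat this as an illustration only. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document against a tokenized query with textbook BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        dl = len(doc)
        score = 0.0
        for term in query:  # repeated query terms are counted repeatedly
            if tf[term] == 0:
                continue
            # Robertson/Sparck Jones IDF; it is negative for terms in more than half the documents.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical usage on a toy collection of already-tokenized "transcripts".
docs = [["a", "a", "b"], ["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
scores = bm25_scores(["a"], docs)
```

With equal-length documents, the one containing the query term twice scores higher than the one containing it once, but by less than a factor of two because k1 saturates the term-frequency contribution.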
The THISL SDR System at TREC-9
This paper describes our participation in the TREC-9 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of a real-time version of a hybrid connectionist/HMM large vocabulary speech recognition system and a probabilistic text retrieval system. This paper describes the configuration of the speech recognition and text retrieval systems, including segmentation and query expansion. ...
AT&T at TREC-8
In 1999, AT&T participated in the ad-hoc task and the Question Answering (QA), Spoken Document Retrieval (SDR), and Web tracks. Most of our effort for TREC-8 focused on the QA and SDR tracks. Results from the SDR track show that our document expansion techniques, presented in [8, 9], are very effective for speech retrieval. The results for question answering are also encouraging. Our system designed ...
Spoken Document Retrieval for TREC-8 at Cambridge University
This paper presents work done at Cambridge University on the TREC-8 Spoken Document Retrieval (SDR) Track. The 500 hours of broadcast news audio was filtered using an automatic scheme for detecting commercials, and then transcribed using a 2-pass HTK speech recogniser which ran at 13 times real time. The system gave an overall word error rate of 20.5% on the 10 hour scored subset of the corpus,...
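Word error rate, quoted here and in the LIMSI abstracts above, is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. This is a minimal sketch of that definition only; official NIST scoring (e.g. sclite) also applies text normalization and alignment conventions that are not reproduced, and the function name `word_error_rate` is hypothetical.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """(substitutions + deletions + insertions) / len(reference), via edit distance."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous row of the edit-distance table:
    # prev[j] = cost of aligning the reference words seen so far with hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)  # 0 if the words match
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical usage: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat".split(), "the cat sit on mat".split())
```

Because insertions are counted against the reference length, WER can exceed 100% when the recognizer outputs many spurious words.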
TREC-6 1997 Spoken Document Retrieval Track Overview and Results
This paper describes the 1997 TREC-6 Spoken Document Retrieval (SDR) Track which implemented a first evaluation of retrieval of broadcast news excerpts using a combination of automatic speech recognition and information retrieval technologies. The motivations behind the SDR Track and background regarding its development and implementation are discussed. The SDR evaluation collection and topics ...